Overview

Let’s load the Prosper data and take a look at the number of rows and columns.

Row count:

## [1] 113937

Column count:

## [1] 81

There are 113937 listings in the dataset with 81 variables. For the scope of this project, I am going to limit the number of variables. The question is which ones.
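The code behind these two counts is not shown in the report. A minimal sketch of how they could be obtained in R is below. It reads a tiny made-up CSV from a string so that it runs on its own; with the real data, `read.csv()` would be pointed at the Prosper file instead (the file name in the comment is an assumption).

```r
# Stand-in for the real file, e.g. read.csv("prosperLoanData.csv")
# (that file name is an assumption, not taken from the report).
csv_text <- "LoanOriginalAmount,Term,BorrowerRate
9425,36,0.158
10000,36,0.092
3001,36,0.275"

loans <- read.csv(text = csv_text)

n_rows <- nrow(loans)  # number of listings (observations)
n_cols <- ncol(loans)  # number of variables
print(n_rows)
print(n_cols)
```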

Looking at how Prosper works[1], I selected variables that fit the following criteria:

  1. Basic information: information that a user gives to the site (loan amount, loan category, etc.) when they apply for a loan.
  2. Credit profile: information that may aid in generating the ‘Prosper rating’, ‘Borrower rate’ and ‘Term’. This can be seen on the loan listing page on the Prosper site.
  3. Other information: ‘Prosper Rating’, ‘Borrower rate’, ‘Term’ and ‘ListingCreationDate’.
## 'data.frame':    113937 obs. of  15 variables:
##  $ DelinquenciesLast7Years : int  4 0 0 14 0 0 0 0 0 0 ...
##  $ PublicRecordsLast10Years: int  0 1 0 0 0 0 0 1 0 0 ...
##  $ DebtToIncomeRatio       : num  0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
##  $ BankcardUtilization     : num  0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
##  $ RevolvingCreditBalance  : num  0 3989 NA 1444 6193 ...
##  $ DaysWithCreditLine      : num  5128 7161 4839 11928 4266 ...
##  $ InquiriesLast6Months    : int  3 3 0 0 1 0 0 3 1 1 ...
##  $ LoanOriginalAmount      : int  9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
##  $ ListingCategory         : Factor w/ 21 levels "Not available",..: 1 3 1 17 3 2 2 3 8 8 ...
##  $ EmploymentStatus        : Factor w/ 9 levels "","Employed",..: 9 2 4 2 2 2 2 2 2 2 ...
##  $ AnnualIncome            : num  37000 73500 25000 34500 115000 ...
##  $ BorrowerRate            : num  0.158 0.092 0.275 0.0974 0.2085 ...
##  $ Term                    : Factor w/ 3 levels "12","36","60": 2 2 2 2 2 3 2 2 2 2 ...
##  $ ProsperRating           : Factor w/ 7 levels "AA","A","B","C",..: NA 2 NA 2 5 3 6 4 1 1 ...
##  $ ListingCreationDate     : Factor w/ 113064 levels "2005-11-09 20:44:28.847000000",..: 14184 111894 6429 64760 85967 100310 72556 74019 97834 97834 ...
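A structure like the one above could be produced by keeping only the chosen columns and turning Term and ProsperRating into factors. The sketch below uses a small made-up data frame; the column named `Unused` and the sample values are hypothetical, and the rating level order (AA best, HR worst) is taken from the output shown in this report.

```r
# Small stand-in for the full 81-column dataset (values are made up).
raw <- data.frame(
  LoanOriginalAmount = c(9425L, 10000L, 3001L, 15000L),
  Term               = c(36L, 36L, 60L, 12L),
  ProsperRating      = c(NA, "A", "HR", "AA"),
  Unused             = c("x", "y", "z", "w"),  # hypothetical column to drop
  stringsAsFactors   = FALSE
)

# Keep only the variables of interest.
keep  <- c("LoanOriginalAmount", "Term", "ProsperRating")
loans <- raw[, keep]

# Term has only three possible values, so treat it as a factor.
loans$Term <- factor(loans$Term, levels = c(12, 36, 60))

# Fix the level order of the rating from best (AA) to worst (HR).
loans$ProsperRating <- factor(
  loans$ProsperRating,
  levels = c("AA", "A", "B", "C", "D", "E", "HR")
)

str(loans)
```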

Let’s take a look at the data summary:

##  DelinquenciesLast7Years PublicRecordsLast10Years DebtToIncomeRatio
##  Min.   : 0.000          Min.   : 0.0000          Min.   : 0.000   
##  1st Qu.: 0.000          1st Qu.: 0.0000          1st Qu.: 0.140   
##  Median : 0.000          Median : 0.0000          Median : 0.220   
##  Mean   : 4.155          Mean   : 0.3126          Mean   : 0.276   
##  3rd Qu.: 3.000          3rd Qu.: 0.0000          3rd Qu.: 0.320   
##  Max.   :99.000          Max.   :38.0000          Max.   :10.010   
##  NA's   :990             NA's   :697              NA's   :8554     
##  BankcardUtilization RevolvingCreditBalance DaysWithCreditLine
##  Min.   :0.000       Min.   :      0        Min.   : 1038     
##  1st Qu.:0.310       1st Qu.:   3121        1st Qu.: 5704     
##  Median :0.600       Median :   8549        Median : 7299     
##  Mean   :0.561       Mean   :  17599        Mean   : 7648     
##  3rd Qu.:0.840       3rd Qu.:  19521        3rd Qu.: 9278     
##  Max.   :5.950       Max.   :1435667        Max.   :24900     
##  NA's   :7604        NA's   :7604           NA's   :697       
##  InquiriesLast6Months LoanOriginalAmount           ListingCategory 
##  Min.   :  0.000      Min.   : 1000      Debt consolidation:58308  
##  1st Qu.:  0.000      1st Qu.: 4000      Not available     :16965  
##  Median :  1.000      Median : 6500      Other             :10494  
##  Mean   :  1.435      Mean   : 8337      Home improvement  : 7433  
##  3rd Qu.:  2.000      3rd Qu.:12000      Business          : 7189  
##  Max.   :105.000      Max.   :35000      Auto              : 2572  
##  NA's   :697                             (Other)           :10976  
##       EmploymentStatus  AnnualIncome       BorrowerRate    Term      
##  Employed     :67322   Min.   :       0   Min.   :0.0000   12: 1614  
##  Full-time    :26355   1st Qu.:   38404   1st Qu.:0.1340   36:87778  
##  Self-employed: 6134   Median :   56000   Median :0.1840   60:24545  
##  Not available: 5347   Mean   :   67296   Mean   :0.1928             
##  Other        : 3806   3rd Qu.:   81900   3rd Qu.:0.2500             
##               : 2255   Max.   :21000035   Max.   :0.4975             
##  (Other)      : 2718                                                 
##  ProsperRating                      ListingCreationDate
##  C      :18345   2013-10-02 17:20:16.550000000:     6  
##  B      :15581   2013-08-28 20:31:41.107000000:     4  
##  A      :14551   2013-09-08 09:27:44.853000000:     4  
##  D      :14274   2013-12-06 05:43:13.830000000:     4  
##  E      : 9795   2013-12-06 11:44:58.283000000:     4  
##  (Other):12307   2013-08-21 07:25:22.360000000:     3  
##  NA's   :29084   (Other)                      :113912

Univariate Plots Section

Basic information

Loan amount

Several sharp lines appear at certain amounts. No surprise here: people tend to borrow round amounts. Let’s take a look at the statistics.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    4000    6500    8337   12000   35000

The minimum loan is 1000, with the median of 6500 and mean of 8337.

Let’s see the most common loan amount.

## [1] 4000

Interestingly, 4000 is the most common amount borrowed, followed by 10000 and 15000.
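One way to find the most common amounts is to tabulate the values and sort the counts. This is a sketch on made-up amounts, not necessarily how the number above was produced.

```r
# Made-up loan amounts for illustration.
loan_amount <- c(4000, 10000, 4000, 15000, 10000, 4000, 2500)

# Count each distinct amount, most frequent first.
amount_counts <- sort(table(loan_amount), decreasing = TRUE)

# The name of the first entry is the most common amount.
most_common <- as.numeric(names(amount_counts)[1])
print(most_common)
print(head(amount_counts, 3))
```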

The maximum loan requested is 35000. Let’s see how many listings asked for that much.

## [1] 430

Not many: only 430 out of all 113937 observations.

Loan category

##             Not available        Debt consolidation 
##                     16965                     58308 
##          Home improvement                  Business 
##                      7433                      7189 
##             Personal loan               Student use 
##                      2395                       756 
##                      Auto                     Other 
##                      2572                     10494 
##     Baby & Adoption Loans                      Boat 
##                       199                        85 
##       Cosmetic Procedures Engagement Ring Financing 
##                        91                       217 
##               Green Loans        Household Expenses 
##                        59                      1996 
##           Large Purchases            Medical/Dental 
##                       876                      1522 
##                Motorcycle                        RV 
##                       304                        52 
##                     Taxes                  Vacation 
##                       885                       768 
##             Wedding Loans 
##                       771

Most people borrow to consolidate their debts: 58308 listings in total, or about 51.2%.
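The share of a category can be read off with `prop.table()`. The sketch below uses a handful of made-up categories, then checks the percentage quoted above from the counts in this report.

```r
# Made-up categories for illustration.
category <- c("Debt consolidation", "Debt consolidation", "Other",
              "Auto", "Debt consolidation", "Business")

counts    <- table(category)
share_pct <- round(100 * prop.table(counts), 2)  # percentage per category
print(share_pct)

# The same arithmetic with the counts reported above.
debt_share <- round(100 * 58308 / 113937, 1)
print(debt_share)
```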

Employment Status

##                    Employed     Full-time Not available  Not employed 
##          2255         67322         26355          5347           835 
##         Other     Part-time       Retired Self-employed 
##          3806          1088           795          6134

Most borrowers are employed. There are 67322 employed borrowers or about 59%.

Annual Income

At binwidth=1000, we can see sharp lines around some amounts, which makes sense since users tend to enter round numbers. The histogram is skewed to the right (positively skewed): most incomes sit at the low end, with a long tail of high values.

Another look with a larger binwidth.

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##        0    38400    56000    67300    81900 21000000

The median for AnnualIncome is 56000, the mean is 67300. The maximum entry is 21 million (1750000 per month).

Credit profile

Payment history

Most borrowers have no delinquencies in the last 7 years and no public records in the last 10 years. If I remove the borrowers with 0 delinquencies and 0 public records, I get:

Let’s take a look at the statistics for DelinquenciesLast7Years.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.000   0.000   4.155   3.000  99.000     990
## 
##     0     1     2     3     4     5     6     7     8     9    10    11 
## 76439  3967  2879  3183  2592  1826  1790  1648  1421  1208  1151  1075 
##    12    13    14    15    16    17    18    19    20    21    22    23 
##   982   873   821   795   731   608   574   540   565   472   421   439 
##    24    25    26    27    28    29    30    31    32    33    34    35 
##   423   347   330   317   296   287   248   214   225   190   190   201 
##    36    37    38    39    40    41    42    43    44    45    46    47 
##   147   153   144   148   113   106   128   101   110    81    90    94 
##    48    49    50    51    52    53    54    55    56    57    58    59 
##    78    74    72    72    55    40    40    39    53    30    31    34 
##    60    61    62    63    64    65    66    67    68    69    70    71 
##    41    34    36    31    28    34    27    22    20    20    15    13 
##    72    73    74    75    76    77    78    79    80    81    82    83 
##    14    17     9    22    10    15    10     8    12     4    12     6 
##    84    85    86    87    88    89    90    91    92    93    94    95 
##     8     3     7     7     9     5     7     4     6     2     3     4 
##    96    97    98    99 
##     4     4     3   110

While most borrowers have 0 delinquencies, 3967 borrowers have exactly 1 delinquency in the last 7 years, and 36508 of the 112947 borrowers with a recorded value have 1 or more. There are also 110 borrowers with 99 delinquencies (the maximum) in the last 7 years.

And the statistics for PublicRecordsLast10Years.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.0000  0.0000  0.0000  0.3126  0.0000 38.0000     697
## 
##     0     1     2     3     4     5     6     7     8     9    10    11 
## 85803 22834  3011   894   345   151    70    46    31    15     8     7 
##    12    13    14    15    16    17    20    21    22    25    30    34 
##     4     1     4     3     5     1     1     1     1     1     1     1 
##    38 
##     1

While most borrowers have 0 public records in the last 10 years, 22834 borrowers have exactly 1 public record, and 27437 of the 113240 borrowers with a recorded value have 1 or more. The maximum number of public records is 38.

Debt burden

Debt to Income Ratio

A debt-to-income ratio is the percentage of a consumer’s monthly gross income that goes toward paying debts. The data is capped at 10.01: any debt-to-income ratio larger than 1000% is recorded as 1001%.

Removing the upper quantile of the data, we get:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.140   0.220   0.276   0.320  10.010    8554

The maximum is 10.01. As specified in the data definition, the debt-to-income ratio is always capped at 10.01 (1001%). The minimum value is 0, the median is 0.22 and the mean is 0.276.

Revolving Credit Balance

Revolving Credit Balance is the total outstanding balance that the borrower owes on open credit cards or other revolving credit accounts.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##       0    3121    8549   17600   19520 1436000    7604

The median is 8549 and the mean is 17600. The maximum value is 1436000. The minimum, which is also the most common amount, is 0.

Bankcard Utilization

Bankcard utilization is the sum of the balances owed on open bankcards divided by the sum of the cards’ credit limits. Lower usually means better.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.310   0.600   0.561   0.840   5.950    7604

Interestingly, there are 2 peaks in the plot: a lot of borrowers have almost 0% bankcard utilization, and there is another peak near 100%. Some borrowers have utilization > 1.00 (100%).

Number of borrowers with BankcardUtilization < 0.05:

## [1] 9361

Number of borrowers near 1:

## [1] 9532

Number of borrowers with BankcardUtilization >= 1:

## [1] 2574

There are 2574 borrowers whose bankcard utilization is at or above 1 (100%). That means they owe at least as much as their credit limit.
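Counts like these come down to summing logical conditions while skipping missing values. The sketch below uses made-up utilization values. The band used for "near 1" (0.95 up to, but not including, 1) is an assumption, since the report does not state the threshold. It also shows how `>= 1` and `> 1` can give different counts.

```r
# Made-up utilization values, including a missing one.
util <- c(0.00, 0.02, 0.21, 0.97, 1.00, 1.35, NA, 0.60)

n_low        <- sum(util < 0.05, na.rm = TRUE)              # almost unused
n_near_one   <- sum(util >= 0.95 & util < 1, na.rm = TRUE)  # assumed "near 1" band
n_at_or_over <- sum(util >= 1, na.rm = TRUE)                # at or over the limit
n_over       <- sum(util > 1, na.rm = TRUE)                 # strictly over the limit

print(c(low = n_low, near_one = n_near_one,
        at_or_over = n_at_or_over, over = n_over))
```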

Length of credit history

Length of credit history is the number of days from the date when the oldest account on the borrower’s credit record was opened until today.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    1038    5704    7299    7648    9278   24900     697

The longest credit history is 24900 days, or about 68 years.
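DaysWithCreditLine is a derived variable, and the report does not show how it was computed. A plausible sketch is below. It assumes the source column is a date of the first recorded credit line (called `first_credit_line` here, a hypothetical name) and uses a fixed reference date in place of "today" so that the result is reproducible.

```r
# Hypothetical dates of the oldest credit line (made-up values).
first_credit_line <- as.Date(c("2014-01-01", "2005-01-01", NA))

# Fixed stand-in for "today" so the example always gives the same answer.
reference_date <- as.Date("2015-01-01")

# Subtracting Dates gives a difftime in days; convert to plain numbers.
days_with_credit_line <- as.numeric(reference_date - first_credit_line)
print(days_with_credit_line)

# Converting the reported maximum of 24900 days into years.
max_years <- round(24900 / 365.25, 1)
print(max_years)
```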

Other information

##    AA     A     B     C     D     E    HR  NA's 
##  5372 14551 15581 18345 14274  9795  6935 29084

The most common rating (excluding the NA’s) is C, with 18345 listings. Only 5372 listings, or about 4.71%, have an AA rating.

##    12    36    60 
##  1614 87778 24545

Most loans, about 77%, have a 36-month term.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1340  0.1840  0.1928  0.2500  0.4975

The median borrower rate is 18.4% and the mean is 19.28%. The maximum borrower rate is 0.4975, or 49.75%. There are 6 observations with a borrower rate above 40%.

Univariate Analysis

What is/are the main feature(s) of interest in your dataset?

The main features of the data are:

I chose these variables because they are visible in the UI[1].

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

I added ListingCreationDate, just to see if there is a “trend” in the behavior over time.

Did you create any new variables from existing variables in the dataset?

Yes, Days with credit line.

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

All of the money-related variables (LoanOriginalAmount, RevolvingCreditBalance and AnnualIncome) are positively skewed. I did not transform the data for the univariate analysis.


Bivariate Plots Section

Let’s see the relationship between LoanOriginalAmount and ListingCategory.

The baby and adoption loans look similar to debt consolidation. Let’s take a look at the statistics.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    4000    9500    9908   15000   35000
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2000    4000    9000    9751   15000   30000

The top one is the debt consolidation summary and the bottom one is baby and adoption loans. Very close, but not identical.

It is interesting to note that wedding loans are quite high. Let’s take a look at the numbers.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2000    4000    7500    8836   13000   35000

The median and mean are 7500 and 8836, with maximum value up to 35000.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    2500    4000    4873    6000   25000
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    1600    3000    4089    5000   25000

Interestingly, borrowers who are not employed request larger loans than part-time borrowers. The median and mean for not employed borrowers are 4000 and 4873, versus 3000 and 4089 for part-time borrowers. Both groups have a maximum loan of 25000.

Now let’s look at the relationship between LoanOriginalAmount and AnnualIncome.

Most loans are below 10000 and most annual incomes are under 100000. The quantile lines show that the higher the annual income, the higher the median loan amount.

The number of observations with a loan amount < 10000 and an annual income less than 100000 is:

## [1] 65553

which is around 57.53% of the data.

It seems that people who borrow > 25000 have an annual income of >= 100000. It looks like there is some kind of rule: if you borrow > 25000, the minimum annual income is 100000.

Let’s verify this a bit.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  100000  115000  137200  160200  175000  800000

And yes, the minimum annual income is 100000. Let’s also check the correlation between the 2 variables.

## [1] 0.2012595

This indicates a weak positive relationship.
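Several of these variables contain NA’s, so a plain call to `cor()` would return NA. A sketch of a Pearson correlation restricted to complete pairs is below, on made-up values. Whether the report used `use = "complete.obs"` or another way of handling NA’s is an assumption.

```r
# Made-up values with one missing entry in each variable.
loan_amount   <- c(3000, 10000, 15000, 4000, NA, 25000)
annual_income <- c(25000, 73500, 115000, NA, 56000, 150000)

# With the default settings any NA makes the result NA.
r_default <- cor(loan_amount, annual_income)

# Use only the rows where both values are present.
r <- cor(loan_amount, annual_income, use = "complete.obs")
print(r_default)
print(r)
```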

Now let’s check DebtToIncomeRatio against BorrowerRate.

That is not too informative. Let’s add some quantile information.

The median borrower rate gets higher as the debt-to-income ratio gets higher.

Another way to look at this is to break DebtToIncomeRatio into several bins.

A quick look at the newly created variable.
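The binning step can be sketched with `cut()`. The bin labels below match the ones that appear later in this report (0-10% up to 100-1000%), but the exact break points and which side of each interval is closed are assumptions.

```r
# Made-up debt-to-income ratios, including the 10.01 cap and a missing value.
dti <- c(0.05, 0.17, 0.25, 0.99, 1.00, 10.01, NA, 0)

# Ten bins of width 0.1 up to 1, then one wide bin up to the cap.
breaks <- c(seq(0, 1, by = 0.1), 10.01)
labels <- c(
  paste0(seq(0, 90, by = 10), "-", seq(10, 100, by = 10), "%"),
  "100-1000%"
)

# right = FALSE gives intervals [a, b); include.lowest = TRUE also closes
# the last interval on the right so the capped value 10.01 is kept.
dti_bin <- cut(dti, breaks = breaks, labels = labels,
               include.lowest = TRUE, right = FALSE)

print(table(dti_bin, useNA = "ifany"))
```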

Let’s have another look at the relationship between DebtToIncomeRatio and BorrowerRate.

We can see that the median BorrowerRate increases as DebtToIncomeRatio increases. Let’s check the correlation between these 2 variables.

## [1] 0.06291678

The correlation is weak: it shows no or negligible linear relationship. This is contrary to my initial assumption, which was that the correlation would be clearly positive.

Separating the debt-to-income ratio into bins provides an interesting look at the data, so let’s separate the borrower rate into bins as well.

Quick look at the result.

Let’s check the relationship with annual income.

These two plots show the same data, the first one using the newly created BorrowerRate bins. In general, the plots show that the higher your annual income, the lower your borrower rate.

Let’s take a look at the correlation.

## [1] -0.0889818

Again, while the plot shows some trend, the correlation between the two variables is negligible.

Let’s see the relationship of the borrower rate with other variables.

As BankcardUtilization, DelinquenciesLast7Years and PublicRecordsLast10Years increase, so does the borrower rate. On the other hand, the higher the RevolvingCreditBalance, the lower the BorrowerRate.

Let’s check the correlation.

Correlation between BorrowerRate and BankcardUtilization:

## [1] 0.255482

Correlation between BorrowerRate and RevolvingCreditBalance:

## [1] -0.05960823

Correlation between BorrowerRate and DelinquenciesLast7Years:

## [1] 0.1702787

Correlation between BorrowerRate and PublicRecordsLast10Years:

## [1] 0.1283138

Correlation between BorrowerRate and DaysWithCreditLine:

## [1] -0.0474466

Correlation between BorrowerRate and InquiriesLast6Months:

## [1] 0.18381

BankcardUtilization has a weak positive relationship with BorrowerRate. The other factors have a negligible relationship: positive for DelinquenciesLast7Years, PublicRecordsLast10Years and InquiriesLast6Months, and negative for RevolvingCreditBalance and DaysWithCreditLine.

Let’s take a look at the relationship of Term with other variables now.

The median LoanOriginalAmount increases as the term gets longer. Let’s verify this.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    2000    3500    4694    5000   25000
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    3000    5000    7276   10000   35000
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2000    8000   11500   12370   15000   35000

Listings with a 60-month term have a median of 11500, compared with 5000 for 36 months and 3500 for 12 months.
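Per-group summaries like the three blocks above can be produced by applying a function within each level of a factor. The sketch below uses `tapply()` on made-up data. The report may instead have called `summary()` on separate subsets.

```r
# Made-up listings with a term and a loan amount.
loans <- data.frame(
  Term = factor(c(12, 36, 36, 60, 60, 12, 36), levels = c(12, 36, 60)),
  LoanOriginalAmount = c(2000, 5000, 3000, 11500, 15000, 5000, 10000)
)

# Median loan amount within each term.
median_by_term <- tapply(loans$LoanOriginalAmount, loans$Term, median)
print(median_by_term)

# Full five-number summary (plus mean) for each term.
summary_by_term <- tapply(loans$LoanOriginalAmount, loans$Term, summary)
print(summary_by_term)
```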

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0   44200   67000   82660   97930 7423000
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##        0    36000    54000    65290    80000 21000000
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0   45000   64000   73470   90000 1305000

The median annual income for the 12-month term is 67000, higher than for the 36-month term (54000) and the 60-month term (64000).

The relationship with DebtToIncomeRatio:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.0100  0.1100  0.1700  0.2202  0.2800 10.0100     199
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.140   0.210   0.282   0.310  10.010    6953
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.0100  0.1700  0.2300  0.2565  0.3200 10.0100    1402

The median DebtToIncomeRatio increases as the term goes up: 0.17 for the 12-month term, 0.21 for the 36-month term and 0.23 for the 60-month term.

Another look, this time using DebtToIncomeRatio.bin.

The term distribution is actually pretty even across DebtToIncomeRatio.bin. Let’s check the correlation.

## [1] -0.01467005

We see that the correlation is negligible.

Let’s look at borrower rate vs term as well.

This is quite interesting: no listing with a borrower rate above 30% has a 12-month term.

Okay, we see the same result from the boxplot. Let’s look at the numbers.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0400  0.0929  0.1434  0.1501  0.2064  0.2669
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1274  0.1815  0.1935  0.2599  0.4975
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0669  0.1490  0.1870  0.1930  0.2319  0.3304

The median BorrowerRate for 60 months term is 0.1870, the highest among the 3.

Let’s check the relationship of ProsperRating with DebtToIncomeRatio and BorrowerRate now.

The better the rating, the lower the borrower rate. Let’s verify this.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.04000 0.06990 0.07790 0.07912 0.08450 0.21000
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0498  0.0990  0.1119  0.1129  0.1239  0.2150
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0693  0.1414  0.1509  0.1545  0.1639  0.3500
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0895  0.1765  0.1914  0.1944  0.2099  0.3500
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1157  0.2287  0.2492  0.2464  0.2625  0.3500
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1479  0.2712  0.2925  0.2933  0.3149  0.3600
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1779  0.3134  0.3177  0.3173  0.3177  0.3600

The median borrower rate increases as we move to worse ratings.

Okay, let’s now check out the debt-to-income ratio.

It seems that at rating AA, the maximum DebtToIncomeRatio is 50%. Let’s inspect the data for this.

##     0-10%    10-20%    20-30%    30-40%    40-50%    50-60%    60-70% 
##      1224      2339      1227       300        38        11         1 
##    70-80%    80-90%   90-100% 100-1000%      NA's 
##         2         1         3         4       222

Our initial thought is not true. Even for the AA rating we still have debt-to-income ratios > 50%. So there are listings where the Prosper rating is good but the debt-to-income ratio is more than 50%.

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

I wanted to see how several features affect the borrower rate, term and Prosper rating. I put several variables into “bins”, as this makes them a bit easier to work with. By doing this for the borrower rate, debt-to-income ratio and delinquencies, we can paint a clearer picture of the relationships between features.

We can see, for instance, that the borrower rate increases as the debt-to-income ratio increases. The term seems to be related to the loan original amount: the bigger the amount, the longer the term. The borrower rate also shows a slight increase as the delinquencies in the last 7 years go up.

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

No.

What was the strongest relationship you found?

If we plot ProsperRating against other features, the picture becomes much clearer. For instance, we can quickly see that the debt-to-income ratio for rating AA is lower than for other ratings.

The borrower rate shows an even clearer picture. The better your rating, the lower your borrower rate. At rating C, only 8 listings have a borrower rate between 0 and 10%.

Term is not affected by the rating: the most common term is 36 months across all ratings.

The debt-to-income ratio distribution in each rating is also interesting to see. While there is a pattern that higher debt-to-income ratios tend to be more common in the lowest rating (HR), there seem to be exceptions. For instance, people with a huge debt-to-income ratio can also have a Prosper rating of AA, though not many: 4 out of 5150, or almost zero percent.


Multivariate Plots Section

Let’s compare the debt to income ratio again, this time faceted by rating.

From the bivariate analysis, the correlation between DebtToIncomeRatio and BorrowerRate is actually not that strong. This plot shows that. Most of the DebtToIncomeRatio values are clustered under 1. The really strong relationship is between the rating and the borrower rate. In fact, for ratings B, D and E, the median borrower rate decreases as the debt-to-income ratio rises.

Let’s take a look at the distribution instead.

If we look at the plot above, we see that for rating AA there is no borrower rate bin larger than 20%. So it shows the distribution of the borrower rate pretty well (better rating, better borrower rate). But if we look at the debt-to-income ratio bins, we see that they are all over the place. It is true that rating HR has more borrowers with a high debt-to-income ratio, but it also has borrowers with a low debt-to-income ratio.

Next, let’s see if we can add another dimension by using ListingCreationDate.

There are some differences from year to year in the feature distribution within each rating. For instance, in the borrower rate distribution we can see that the borrower rate for rating AA in 2013 and 2014 is almost always between 0-10%. For rating B we seem to have borrower rates of 20-30% in 2011 and 2012, but 10-20% in 2013 and 2014.
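To facet by year, the year first has to be pulled out of ListingCreationDate, which is stored as text such as "2013-10-02 17:20:16.550000000". A sketch on made-up rows is below. Taking the first four characters is one simple option and is not necessarily what the report did.

```r
# Made-up listings: creation timestamp, rating and borrower rate bin.
listing_created <- c("2009-07-13 18:01:24.347000000",
                     "2013-10-02 17:20:16.550000000",
                     "2014-02-27 08:28:07.900000000",
                     "2013-08-28 20:31:41.107000000")
rating   <- factor(c("AA", "AA", "B", "B"), levels = c("AA", "B"))
rate_bin <- c("10-20%", "0-10%", "10-20%", "10-20%")

# The year is the first four characters of the timestamp.
# (as.character() guards against the column being a factor.)
year <- as.integer(substr(as.character(listing_created), 1, 4))

# Three-way count: rating x year x borrower rate bin.
counts <- table(rating = rating, year = year, rate_bin = rate_bin)
print(ftable(counts))
```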

Let’s do the same with debt to income ratio.

The distribution shows interesting patterns. Each rating seems to allow a higher debt-to-income ratio from year to year.


Final Plots and Summary

Plot One

Description One

This plot shows the effect of several factors on ProsperRating. The data for each factor is scaled, grouped by Prosper rating and averaged. I use a line plot to show how each factor trends across the ProsperRating levels. Since the data is scaled, each point also shows how far it is from the overall mean (the 0 line).

Several things we can see:

  • If a borrower has lower bankcard utilization, he/she will usually get a better rating.
  • This is also the case for debt-to-income ratio, delinquencies in the last 7 years, inquiries in the last 6 months and public records in the last 10 years.
  • On the other hand, the longer you have had a credit line (DaysWithCreditLine), the better.
  • The higher your revolving credit balance, the better your rating.
  • For the HR rating, the most important factors seem to be inquiries in the last 6 months and debt-to-income ratio. This makes sense: a borrower who is shopping around for credit does not look good. Other factors seem to be less important. For instance, bankcard utilization for HR is actually lower than for rating E or even D. This happens with other factors as well: days with credit line for HR is actually better than for E and D, and even C.
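The scale, group and average steps described for Plot One can be sketched as follows, on a made-up data frame with two of the factors. The plotting itself is left out, and the choice of `scale()` and `aggregate()` is an assumption about how the steps were carried out.

```r
# Made-up listings with a rating and two numeric factors.
loans <- data.frame(
  ProsperRating       = factor(c("AA", "AA", "C", "C", "HR", "HR"),
                               levels = c("AA", "C", "HR")),
  BankcardUtilization = c(0.10, 0.20, 0.50, 0.60, 0.90, 0.70),
  DaysWithCreditLine  = c(9000, 8000, 7000, 6500, 5000, 6000)
)

factors <- c("BankcardUtilization", "DaysWithCreditLine")

# 1. Scale each factor to mean 0 and standard deviation 1.
scaled <- as.data.frame(scale(loans[, factors]))
scaled$ProsperRating <- loans$ProsperRating

# 2. Group by rating and average the scaled values.
#    (The formula interface drops rows that contain NA.)
mean_by_rating <- aggregate(. ~ ProsperRating, data = scaled, FUN = mean)
print(mean_by_rating)
```

Values above 0 mean the rating group sits above the overall average for that factor, and values below 0 mean it sits below.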

Plot Two

Description Two

This plot shows the borrower rate distribution by ProsperRating. If your rating is AA you will likely get a 0-10% borrower rate. If your rating is A, B or C, you will likely get 10-20%. For D and E it is between 20-30%, and for HR it is 30-40%. Note that while most HR listings have a high borrower rate (30-40%), there are still a few with a 10-20% borrower rate.

Plot Three

Description Three

This plot is another look at plot 2 with the added dimension of listing creation date. The plot shows the trend of the borrower rate from 2009 to 2014, faceted by ProsperRating. We can see that a borrower rated AA in 2009 could get a 10-20% borrower rate. In 2013 and 2014, a borrower rated AA gets a 0-10% borrower rate. Most borrowers rated E in 2009 got a 30-40% rate, but in 2014 they can actually get a 20-30% borrower rate. The number of borrowers with a 20-30% borrower rate also increases significantly for rating E from 2012 onwards. For HR in 2009 and 2010, quite a number of borrowers could still get a 10-20% borrower rate, but from 2011 onward this is no longer possible.

Overall, it seems that in 2009 the borrower rate was actually quite high. Even with rating AA a borrower could still get a 20-30% borrower rate. If your rating was E in 2009 you were likely to get a 30-40% borrower rate. This no longer happens in 2014. In 2014, AA will most likely give you a 0-10% borrower rate. A, B and C ratings will get 10-20%, and D and E most likely 20-30%. In 2014 you can still get a 20-30% borrower rate with an HR rating, whereas in 2011-2012 this was not possible.


Reflection

The Prosper data has a lot of variables, so for the scope of this project I limited the number of variables to investigate. The first step was to select which variables to investigate. After much thought, I used the variables that a borrower can actually see on the loan listing page[1]. I did this because I assume these are the metrics that are important for a lender to look at before actually lending money, so they are a good start.

Initially I wanted to show the relationship between the variables and the borrower rate, for instance debt-to-income ratio vs borrower rate, or bankcard utilization vs borrower rate. To ease the exploration I put several variables into “bins”. Putting them into bins makes it easier for me to show the relationships between variables.

It is also much easier to show relationships based on ProsperRating than on borrower rate. For instance, if we facet the debt-to-income ratio by ProsperRating, it is easier to see that the lower your debt-to-income ratio, the better your rating. We can then show that the better your rating, the better your borrower rate.

Even with this limited number of variables, there are a lot of things that we can investigate further. One thing we can look at is how a borrower gets a ProsperRating, that is, what ProsperRating is made up of. I laid out several factors, but those factors are based on the Prosper UI[1]. The data also contains information on the borrower’s credit score, so it would be interesting to see what kind of relationship exists between credit score and ProsperRating.

References

[1] https://www.prosper.com/help/topics/how-to-read-a-loan-listing/